XR Adaptive Modality: Experiment Report

Author

Mohammad Dastgheib

Published

December 13, 2025

Data Quality & Coverage Gate (Interim N)

Note

This section prevents misleading models when cells are missing. Current data are interim (N=26); treat all effects as descriptive.

Interim Status: All inferential models in this document are fit on N=26 (interim sample), not N=48. Any statements about N=48 refer to the planned final sample, not the data used here.

Participant counts (interim). Note: `df` denotes the analysis dataset: correct, non-practice trials with RT ∈ [150, 6000] ms.
Metric Count
Total participants (raw) 26
Participants with any valid trials 26
Participants in df (correct, RT-filtered) 26
Condition coverage (modality × ui_mode × pressure)
modality ui_mode pressure trials pids missing_cell Status
hand static 0 675 25 FALSE OK
hand static 1 702 26 FALSE OK
hand adaptive 0 675 25 FALSE OK
hand adaptive 1 702 26 FALSE OK
gaze static 0 645 24 FALSE OK
gaze static 1 701 26 FALSE OK
gaze adaptive 0 700 26 FALSE OK
gaze adaptive 1 675 25 FALSE OK
All factors have ≥2 levels in the interim data.
Blocks logged per participant
pid blocks_logged
P002 7
P003 8
P004 8
P006 8
P007 8
P008 8
P009 8
P010 8
P011 8
P014 8
P015 8
P018 8
P019 8
P020 8
P022 8
P023 8
P024 8
P025 8
P029 8
P037 4
P038 8
P039 8
P040 8
P041 8
P042 8
P049 8
Interim Report Status

Current dataset: N=26 participants (interim sample). All inferential results in this document are preliminary and will be re-estimated at N≈48. Sections 16 (LBA) and 17 (Control Theory) document planned analyses that have not yet been implemented at the current interim N.

1. Executive Summary

This report analyzes 26 participants performing Fitts’ law pointing tasks across two input modalities (Hand, Gaze) and two UI modes (Static, Adaptive).

Note on Participant Exclusions: Seven participants (P002, P003, P007, P008, P015, P039, P040) were excluded from the planned 2×2×2 factorial dataset due to a data logging error that incorrectly recorded pressure conditions. The bug was fixed on December 8, 2025 (commit 04758db), and seven replacement participants (P049-P055) will be collected to reach the planned final sample of N=48. All analyses in this interim report use N=26 participants with complete data across all conditions.

All statistical models in this document are fit on N=26 (interim sample), not N=48. See the Data Quality Notes section and EXCLUSION_CRITERIA.md for details.

Results Snapshot (Interim, N = 26)

RQ1 contrasts (adaptive - static; interim descriptive)
modality tp_diff_adapt_static rt_diff_adapt_static err_diff_adapt_static
hand 0.0343380 -0.0174301 0.0014524
gaze -0.1029755 0.0494700 -0.0087236

*Note:* These contrasts are descriptive only; no inferential claims are made at N=26.
RQ2 snapshot: Overall TLX (interim)
modality ui_mode Mean_Overall_TLX
gaze adaptive 48.4
gaze static 46.9
hand adaptive 41.9
hand static 42.2
RQ3 manipulation check: width scaling (interim)
modality ui_mode Mean_Width_Scale Pct_Scaled
gaze adaptive 1 0
gaze static 1 0
hand adaptive 1 0
hand static 1 0

*Note:* In the current build, width scaling was not activated; all recorded `width_scale_factor` values equal 1.0. RQ3 will be revisited once adaptive width scaling is enabled.

Key Findings

  • Total Trials Analyzed: 4734 valid trials (correct responses, RT 150–6000 ms)
  • Total Trials Collected: 5481
  • Overall Error Rate: 14%
  • Mean Throughput: 3.3 bits/s (SD = 1.05)
  • Mean Movement Time: 1.177s (SD = 0.47s)
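
The valid-trial filter described above (correct, non-practice trials with RT in [150, 6000] ms) can be sketched as follows. The actual pipeline is in R; this Python sketch uses illustrative field names (`correct`, `practice`, `rt_ms`), not the real logger schema:

```python
def is_valid_trial(trial):
    """Keep correct, non-practice trials with RT in [150, 6000] ms."""
    return (
        trial["correct"]
        and not trial["practice"]
        and 150 <= trial["rt_ms"] <= 6000
    )

trials = [
    {"correct": True,  "practice": False, "rt_ms": 900},   # kept
    {"correct": False, "practice": False, "rt_ms": 900},   # error -> dropped
    {"correct": True,  "practice": True,  "rt_ms": 900},   # practice -> dropped
    {"correct": True,  "practice": False, "rt_ms": 7000},  # too slow -> dropped
]
valid = [t for t in trials if is_valid_trial(t)]
```
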

2. Demographics

Sample Size: N = 26 participants.

Overall Demographics

N Mean Age SD Age Age Range Mean Gaming (Hrs/Week) SD Gaming
26 30.9 7.4 18 - 54 0.7 2

By Gender

gender Count Avg Age SD Age Avg Gaming (Hrs)
female 9 32.1 8.8 0
male 17 30.3 6.7 1

Input Device Distribution

input_device Count Percentage
mouse 26 100

Gaming Status

Participants were primarily non-gamers (median self-reported gaming = 0 hours/week; only 3.8% reported ≥5 hrs/week).

3. Primary Analysis: Throughput

Research Question: Does the Adaptive UI improve performance (Throughput) compared to Static, especially for Gaze?

Sample Size: N = 26 participants with valid throughput data.

Interim Analysis Note: At this interim N, we observe a large main effect of modality (hand > gaze) on throughput, but no reliable evidence that the Adaptive UI improves TP relative to Static, nor clear interactions with pressure. Interaction effects are treated as exploratory and will be revisited at N=48.

Summary Statistics

Throughput (bits/s) by Condition (N = 26 participants)
modality ui_mode pressure N_participants N_observations Mean SD Median Q25 Q75
hand static 0 25 75 3.50 0.90 3.44 2.89 3.99
hand static 1 26 78 3.53 0.96 3.51 2.90 4.10
hand adaptive 0 25 75 3.64 0.97 3.66 2.95 4.32
hand adaptive 1 26 78 3.46 0.93 3.42 2.69 4.11
gaze static 0 23 69 3.28 1.27 3.08 2.48 4.06
gaze static 1 25 74 2.95 0.88 2.95 2.36 3.38
gaze adaptive 0 26 77 3.05 1.13 2.82 2.32 3.76
gaze adaptive 1 25 75 2.96 1.06 2.69 2.26 3.37

Visualizations

Throughput by Modality and UI Mode (participant-level means). N = 26 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.

Estimated Marginal Means for Throughput. N = 26 participants (shown only when model fits and factors exist).

Statistical Model Results

Planned Sample Size & Power

The throughput analysis was designed for a within-subjects 2×2×2 factorial (modality × UI mode × pressure). Our primary effect of interest is the UI mode main effect (adaptive vs static), which we expect to be medium in size (dz ≈ 0.4–0.6). Standard repeated-measures power calculations and guidelines (Cohen, 1988; Brysbaert, 2019) indicate that N ≈ 50 participants is sufficient for 80% power to detect dz ≈ 0.40. We therefore set N = 48 (six complete Williams sequences) as the primary design target, with the option to extend to N = 64 (eight sequences) if recruitment permits. Given the large number of trials per condition and the mixed-effects model (random intercepts per participant), this sample size is expected to provide high power for UI mode and modality main effects, while interactions are treated as secondary and more exploratory (Kumle et al., 2021; Matuschek et al., 2017).
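
The logic behind the N ≈ 50 figure can be sketched with the normal approximation to paired t-test power (a simplified approximation; exact values from G*Power or the `pwr` package will differ slightly):

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def paired_power(dz: float, n: int) -> float:
    """Approximate power of a two-sided paired test (alpha = .05)
    for within-subject effect size dz with n participants."""
    z_crit = 1.959964  # two-sided 5% critical value
    return phi(dz * math.sqrt(n) - z_crit)

# Planned scenario: medium effect dz = 0.40.
# The interim N=26 is clearly underpowered for this effect size.
for n in (26, 48, 50, 64):
    print(n, round(paired_power(0.40, n), 3))
```
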

### Model:  TP ~ modality * ui_mode * pressure + (1 | pid) 

**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈601) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (26).

**Data Summary:** 26 participants, 601 trials, 8 conditions, minimum 69 trials per condition.

#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
                          Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
modality                  33.741  33.741     1 575.49 53.8459 7.422e-13 ***
ui_mode                    0.205   0.205     1 575.50  0.3275   0.56738    
pressure                   2.735   2.735     1 575.82  4.3651   0.03712 *  
modality:ui_mode           0.807   0.807     1 575.50  1.2880   0.25689    
modality:pressure          0.884   0.884     1 575.78  1.4112   0.23534    
ui_mode:pressure           0.001   0.001     1 575.84  0.0015   0.96892    
modality:ui_mode:pressure  1.624   1.624     1 575.84  2.5916   0.10798    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Interim Analysis Note:** At N=26, the model is underpowered for detecting 3-way interactions. Any non-significant interaction effects should be treated as exploratory and will be revisited at N=48.


#### Model Summary
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: TP ~ modality * ui_mode * pressure + (1 | pid)
   Data: df_iso
Control: lmerControl(optimizer = "bobyqa")

      AIC       BIC    logLik -2*log(L)  df.resid 
   1515.4    1559.3    -747.7    1495.4       591 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6783 -0.7692  0.0250  0.6116  4.3216 

Random effects:
 Groups   Name        Variance Std.Dev.
 pid      (Intercept) 0.3871   0.6222  
 Residual             0.6266   0.7916  
Number of obs: 601, groups:  pid, 26

Fixed effects:
                                        Estimate Std. Error        df t value
(Intercept)                              3.48043    0.15269  55.17874  22.793
modalitygaze                            -0.22036    0.13244 575.66759  -1.664
ui_modeadaptive                          0.14328    0.12927 575.11157   1.108
pressure1                                0.04843    0.12829 575.73630   0.377
modalitygaze:ui_modeadaptive            -0.35561    0.18469 575.71842  -1.925
modalitygaze:pressure1                  -0.36256    0.18438 575.25656  -1.966
ui_modeadaptive:pressure1               -0.21369    0.18105 575.11157  -1.180
modalitygaze:ui_modeadaptive:pressure1   0.41727    0.25920 575.84461   1.610
                                       Pr(>|t|)    
(Intercept)                              <2e-16 ***
modalitygaze                             0.0967 .  
ui_modeadaptive                          0.2682    
pressure1                                0.7060    
modalitygaze:ui_modeadaptive             0.0547 .  
modalitygaze:pressure1                   0.0497 *  
ui_modeadaptive:pressure1                0.2384    
modalitygaze:ui_modeadaptive:pressure1   0.1080    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) mdltyg u_mddp prssr1 mdlt:_ mdlt:1 u_md:1
modalitygaz -0.413                                          
ui_modedptv -0.423  0.488                                   
pressure1   -0.430  0.492  0.504                            
mdltygz:_md  0.294 -0.717 -0.700 -0.350                     
mdltygz:pr1  0.297 -0.716 -0.351 -0.693  0.514              
u_mddptv:p1  0.302 -0.348 -0.714 -0.706  0.500  0.491       
mdltygz:_:1 -0.207  0.510  0.499  0.488 -0.713 -0.712 -0.698

#### Effect Size: Hand vs. Gaze (Collapsed Over UI Mode and Pressure)


Table: Estimated Marginal Means for Throughput by Modality (collapsed over UI mode and pressure)

|Modality | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|----------------:|------------:|------------:|
|Hand     |             3.52|         3.25|         3.79|
|Gaze     |             3.05|         2.78|         3.32|

**Difference (Hand - Gaze):** 0.48 bits/s


#### Pairwise Comparisons (Holm-adjusted)
 contrast                                            estimate        SE     df
 hand static pressure0 - gaze static pressure0      0.2203574 0.1332586 582.65
 hand static pressure0 - hand adaptive pressure0   -0.1432760 0.1300617 582.09
 hand static pressure0 - gaze adaptive pressure0    0.4326909 0.1295218 582.76
 hand static pressure0 - hand static pressure1     -0.0484280 0.1290894 582.73
 hand static pressure0 - gaze static pressure1      0.5344853 0.1310262 583.10
 hand static pressure0 - hand adaptive pressure1    0.0219822 0.1290894 582.73
 hand static pressure0 - gaze adaptive pressure1    0.5432359 0.1300617 582.09
 gaze static pressure0 - hand adaptive pressure0   -0.3636334 0.1332586 582.65
 gaze static pressure0 - gaze adaptive pressure0    0.2123335 0.1327391 583.29
 gaze static pressure0 - hand static pressure1     -0.2687854 0.1323064 583.25
 gaze static pressure0 - gaze static pressure1      0.3141279 0.1338097 583.03
 gaze static pressure0 - hand adaptive pressure1   -0.1983752 0.1323064 583.25
 gaze static pressure0 - gaze adaptive pressure1    0.3228785 0.1332586 582.65
 hand adaptive pressure0 - gaze adaptive pressure0  0.5759669 0.1295218 582.76
 hand adaptive pressure0 - hand static pressure1    0.0948481 0.1290894 582.73
 hand adaptive pressure0 - gaze static pressure1    0.6777613 0.1310262 583.10
 hand adaptive pressure0 - hand adaptive pressure1  0.1652582 0.1290894 582.73
 hand adaptive pressure0 - gaze adaptive pressure1  0.6865119 0.1300617 582.09
 gaze adaptive pressure0 - hand static pressure1   -0.4811188 0.1279659 582.11
 gaze adaptive pressure0 - gaze static pressure1    0.1017944 0.1298916 582.44
 t.ratio p.value
   1.654  1.0000
  -1.102  1.0000
   3.341  0.0160
  -0.375  1.0000
   4.079  0.0011
   0.170  1.0000
   4.177  0.0008
  -2.729  0.1048
   1.600  1.0000
  -2.032  0.5545
   2.348  0.2692
  -1.499  1.0000
   2.423  0.2355
   4.447  0.0003
   0.735  1.0000
   5.173  <.0001
   1.280  1.0000
   5.278  <.0001
  -3.760  0.0036
   0.784  1.0000

Degrees-of-freedom method: kenward-roger 
P value adjustment: holm method for 28 tests 
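
For reference, Holm's step-down adjustment (as applied by emmeans above) can be reproduced in a few lines. This is a generic sketch of the procedure, not the emmeans implementation:

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values (monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # smallest p gets the largest multiplier (m), then m-1, ...
        adj = (m - rank) * pvals[idx]
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[idx] = min(1.0, running_max)
    return adjusted

result = holm_adjust([0.01, 0.04, 0.03, 0.20])
```
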

4. Movement Time Analysis

Research Question: How does movement time vary across conditions?

Sample Size: N = 26 participants with valid movement time data (correct trials only).

Relationship to Throughput: The RT patterns mirror throughput: hand is faster than gaze. Adaptive vs static and pressure do not show robust main effects on movement time at this N, consistent with the TP results.

Summary Statistics

Movement Time (s) by Condition (N = 26 participants)
modality ui_mode pressure N_participants N_trials Mean SD Median
hand static 0 25 638 1.115 0.382 1.055
hand static 1 26 672 1.104 0.318 1.069
hand adaptive 0 25 644 1.067 0.330 1.018
hand adaptive 1 26 664 1.111 0.303 1.064
gaze static 0 24 492 1.198 0.483 1.107
gaze static 1 25 538 1.259 0.535 1.120
gaze adaptive 0 26 557 1.328 0.706 1.132
gaze adaptive 1 25 529 1.298 0.567 1.153

Visualizations

Movement Time by Modality and UI Mode (participant-level means). N = 26 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.

Estimated Marginal Means for Movement Time. N = 26 participants (shown only when model fits and factors exist).

Statistical Model Results

Planned Sample Size & Power

The log-RT analysis uses the same 2×2×2 within-subjects design and random-intercept LMM as the throughput analysis. Because throughput and RT are mathematically coupled (TP = ID/RT) and we expect similar medium-sized UI mode and modality effects, the sample-size logic is identical: N = 48 is sufficient for detecting dz ≈ 0.40–0.50 differences with ≈0.80 power, and N = 64 further strengthens power for smaller effects and interactions (Cohen, 1988). Trial-level modeling with many repeated observations per participant increases precision, but our power planning is intentionally conservative and based on participant-level effects rather than naïvely counting trials.

### Model:  log_rt ~ modality * ui_mode * pressure + (1 | pid) 

**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈4734) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (26).

**Data Summary:** 26 participants, 4734 trials, 8 conditions, minimum 492 trials per condition.

#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
                           Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
modality                  11.2853 11.2853     1 4709.0 145.7241 < 2.2e-16 ***
ui_mode                    0.2755  0.2755     1 4708.5   3.5579 0.0593237 .  
pressure                   0.6969  0.6969     1 4709.1   8.9993 0.0027150 ** 
modality:ui_mode           1.0035  1.0035     1 4708.6  12.9582 0.0003218 ***
modality:pressure          0.0398  0.0398     1 4709.0   0.5140 0.4734569    
ui_mode:pressure           0.0091  0.0091     1 4709.3   0.1173 0.7319753    
modality:ui_mode:pressure  0.8369  0.8369     1 4709.2  10.8063 0.0010190 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Pairwise Comparisons (Holm-adjusted)
 contrast                                             estimate         SE  df
 hand static pressure0 - gaze static pressure0     -0.03657564 0.01675577 Inf
 hand static pressure0 - hand adaptive pressure0    0.03803403 0.01554896 Inf
 hand static pressure0 - gaze adaptive pressure0   -0.11097521 0.01619107 Inf
 hand static pressure0 - hand static pressure1      0.00540702 0.01542684 Inf
 hand static pressure0 - gaze static pressure1     -0.09656918 0.01637813 Inf
 hand static pressure0 - hand adaptive pressure1   -0.00465645 0.01546792 Inf
 hand static pressure0 - gaze adaptive pressure1   -0.11167748 0.01637919 Inf
 gaze static pressure0 - hand adaptive pressure0    0.07460967 0.01671276 Inf
 gaze static pressure0 - gaze adaptive pressure0   -0.07439957 0.01730357 Inf
 gaze static pressure0 - hand static pressure1      0.04198266 0.01660071 Inf
 gaze static pressure0 - gaze static pressure1     -0.05999354 0.01743526 Inf
 gaze static pressure0 - hand adaptive pressure1    0.03191918 0.01664235 Inf
 gaze static pressure0 - gaze adaptive pressure1   -0.07510184 0.01748569 Inf
 hand adaptive pressure0 - gaze adaptive pressure0 -0.14900924 0.01615716 Inf
 hand adaptive pressure0 - hand static pressure1   -0.03262701 0.01539023 Inf
 hand adaptive pressure0 - gaze static pressure1   -0.13460321 0.01633408 Inf
 hand adaptive pressure0 - hand adaptive pressure1 -0.04269049 0.01543168 Inf
 hand adaptive pressure0 - gaze adaptive pressure1 -0.14971151 0.01634394 Inf
 gaze adaptive pressure0 - hand static pressure1    0.11638223 0.01595706 Inf
 gaze adaptive pressure0 - gaze static pressure1    0.01440603 0.01684592 Inf
 z.ratio p.value
  -2.183  0.2614
   2.446  0.1444
  -6.854  <.0001
   0.350  1.0000
  -5.896  <.0001
  -0.301  1.0000
  -6.818  <.0001
   4.464  0.0001
  -4.300  0.0003
   2.529  0.1258
  -3.441  0.0075
   1.918  0.3858
  -4.295  0.0003
  -9.222  <.0001
  -2.120  0.2721
  -8.241  <.0001
  -2.766  0.0680
  -9.160  <.0001
   7.293  <.0001
   0.855  1.0000

Degrees-of-freedom method: asymptotic 
P value adjustment: holm method for 28 tests 

5. Fitts’ Law Modelling

Research Question: How well does the data fit Fitts’ Law? (Linearity check).

Planned Sample Size & Power

Fitts’ law analyses serve primarily to validate the pointing task and modality differences, not to test the core adaptation hypotheses. The ID effect on movement time is typically very large (R² > .70), and robust Fitts-law slopes are observable with as few as 10–20 participants in classic HCI work. In this study, any final sample N ≥ 30 is more than sufficient for stable ID slopes; our planned N = 48 places this analysis in an over-powered, descriptive regime. We therefore do not perform formal power calculations here and treat Fitts regression as a manipulation check and descriptive characterization of the dataset.

Sample Size: N = 26 participants with valid throughput data.

Flatter slopes indicate less sensitivity to difficulty (ballistic movement).

Fitts Law Regression (Movement Time vs Effective Index of Difficulty). N = 26 participants. The effective index of difficulty (IDe) is calculated using the effective target width (We) derived from the spatial distribution of selection endpoints. Shaded regions around regression lines represent 95% confidence intervals. Linear regression fits are shown separately for each modality and UI mode combination.

### Model Fit Statistics
Linear Regression: MT ~ IDe (N = 26 participants)
modality ui_mode r_squared slope intercept
hand static 0.536 0.147 0.533
hand adaptive 0.538 0.133 0.570
gaze static 0.253 0.179 0.595
gaze adaptive 0.228 0.214 0.538
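
The per-condition fits above are ordinary least squares of movement time on the effective index of difficulty. A minimal sketch of that regression (the analysis itself is in R; the synthetic `ide`/`mt` values here are illustrative, with the hand-static slope and intercept plugged in as a noiseless example):

```python
def ols_fit(x, y):
    """Least-squares intercept, slope, and R^2 for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Illustrative Fitts-style data: MT = 0.533 + 0.147 * IDe (no noise)
ide = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mt = [0.533 + 0.147 * d for d in ide]
a, b, r2 = ols_fit(ide, mt)
```
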

6. Error Rate Analysis

Research Question: How do error rates differ across conditions?

Sample Size: N = 26 participants with all trials (correct + incorrect).

Error Rates by Condition (N = 26 participants)
modality ui_mode pressure Participants Mean_Error_Rate SD_Error_Rate
hand static 0 25 5.48 7.02
hand static 1 26 4.27 5.99
hand adaptive 0 25 4.59 7.27
hand adaptive 1 26 5.41 6.64
gaze static 0 24 23.69 18.44
gaze static 1 25 20.16 12.51
gaze adaptive 0 26 20.43 12.49
gaze adaptive 1 25 21.63 14.21

Error Rate by Modality and UI Mode (participant-level means). N = 26 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.


**Error Rate Summary:** Overall error rate was 13.5%. Errors were concentrated in gaze conditions (22.2%), while hand remained near 4.9%. At this interim N and low error count, we report these differences descriptively.

Statistical Model Results

Planned Sample Size & Power

For the error-rate analysis we fit a binomial GLMM with random intercepts per participant. The key contrasts are again UI mode and modality, where we expect odds-ratio effects in the small-to-medium range (e.g., OR ≈ 0.7–0.8 for adaptive vs static, and OR ≈ 2–3 for gaze vs hand). Binary outcomes with relatively low error rates (≈10–15%) typically require more participants than continuous outcomes for stable mixed-effects estimation (Kumle et al., 2021). For this analysis, we therefore treat N = 64 as a “good” target that yields comfortable power for medium effects, while N = 48 remains adequate but somewhat less stable, especially for interaction terms and rare error types. Error-based interaction effects are interpreted as exploratory, even at N = 64.
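
To make the odds-ratio targets concrete, the sketch below converts an OR against the observed baseline error rate into an absolute error-rate change (the 13.5% baseline is from this interim dataset; the ORs are the planning assumptions stated above):

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Error probability after multiplying the baseline odds by odds_ratio."""
    odds = p_baseline / (1.0 - p_baseline) * odds_ratio
    return odds / (1.0 + odds)

base = 0.135  # interim overall error rate
# Planning assumptions (see text): adaptive vs static OR ~ 0.7-0.8,
# gaze vs hand OR ~ 2-3. An OR of 0.75 or 2.5 is used as a midpoint.
p_adaptive = apply_odds_ratio(base, 0.75)
p_gaze = apply_odds_ratio(base, 2.5)
```
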

### Model:  error ~ modality * ui_mode * pressure + (1 | pid) 

**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈5475) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (26).

**Data Summary:** 26 participants, 5475 trials, 8 conditions, minimum 645 trials per condition.
**Overall Error Rate:** 13.5%
#### ANOVA Table
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: error
                             Chisq Df Pr(>Chisq)    
(Intercept)               187.4919  1     <2e-16 ***
modality                   82.2650  1     <2e-16 ***
ui_mode                     0.5819  1     0.4456    
pressure                    0.9846  1     0.3211    
modality:ui_mode            0.0000  1     0.9965    
modality:pressure           0.6884  1     0.4067    
ui_mode:pressure            1.5809  1     0.2086    
modality:ui_mode:pressure   0.8457  1     0.3578    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Pairwise Comparisons (Holm-adjusted)
 contrast                                          odds.ratio        SE  df
 hand static pressure0 / gaze static pressure0       0.166696 0.0329272 Inf
 hand static pressure0 / hand adaptive pressure0     1.212575 0.3063988 Inf
 hand static pressure0 / gaze adaptive pressure0     0.202389 0.0400403 Inf
 hand static pressure0 / hand static pressure1       1.287513 0.3279074 Inf
 hand static pressure0 / gaze static pressure1       0.169015 0.0331117 Inf
 hand static pressure0 / hand adaptive pressure1     0.996022 0.2398462 Inf
 hand static pressure0 / gaze adaptive pressure1     0.190235 0.0375784 Inf
 gaze static pressure0 / hand adaptive pressure0     7.274181 1.5306456 Inf
 gaze static pressure0 / gaze adaptive pressure0     1.214121 0.1675520 Inf
 gaze static pressure0 / hand static pressure1       7.723729 1.6435371 Inf
 gaze static pressure0 / gaze static pressure1       1.013914 0.1369220 Inf
 gaze static pressure0 / hand adaptive pressure1     5.975089 1.1691062 Inf
 gaze static pressure0 / gaze adaptive pressure1     1.141213 0.1570270 Inf
 hand adaptive pressure0 / gaze adaptive pressure0   0.166908 0.0351470 Inf
 hand adaptive pressure0 / hand static pressure1     1.061800 0.2811833 Inf
 hand adaptive pressure0 / gaze static pressure1     0.139385 0.0291040 Inf
 hand adaptive pressure0 / hand adaptive pressure1   0.821410 0.2064080 Inf
 hand adaptive pressure0 / gaze adaptive pressure1   0.156885 0.0330059 Inf
 gaze adaptive pressure0 / hand static pressure1     6.361580 1.3526089 Inf
 gaze adaptive pressure0 / gaze static pressure1     0.835101 0.1125901 Inf
 null z.ratio p.value
    1  -9.070  <.0001
    1   0.763  1.0000
    1  -8.075  <.0001
    1   0.992  1.0000
    1  -9.074  <.0001
    1  -0.017  1.0000
    1  -8.401  <.0001
    1   9.430  <.0001
    1   1.406  1.0000
    1   9.607  <.0001
    1   0.102  1.0000
    1   9.136  <.0001
    1   0.960  1.0000
    1  -8.502  <.0001
    1   0.226  1.0000
    1  -9.437  <.0001
    1  -0.783  1.0000
    1  -8.804  <.0001
    1   8.702  <.0001
    1  -1.337  1.0000

P value adjustment: holm method for 28 tests 
Tests are performed on the log odds ratio scale 

7. Accuracy & Gaze Dynamics

Sample Size: N = 26 participants with valid accuracy data.

Effective Width (\(W_e\))

Planned Sample Size & Power

Effective width (We) is analyzed at the participant × condition level with a Gaussian LMM. We expect medium effects of modality (gaze > hand) and small-to-medium effects of UI mode (adaptive slightly improving spatial precision). For within-subject effects of this magnitude, N ≈ 48 is sufficient for ≈0.80 power (dz ≈ 0.4–0.5) according to standard repeated-measures power guidelines (Cohen, 1988). We therefore treat N = 48 as a good target for We, with N = 64 mainly helping if UI-mode effects turn out closer to dz ≈ 0.3.

Lower \(W_e\) indicates tighter shot grouping (higher precision).

Effective Width (px) by Condition (N = 26 participants)
modality ui_mode pressure N_participants Mean_We SD_We
hand static 0 25 32.67 20.16
hand static 1 26 32.80 21.69
hand adaptive 0 25 33.06 21.49
hand adaptive 1 26 33.22 20.66
gaze static 0 23 35.72 20.59
gaze static 1 25 36.31 20.45
gaze adaptive 0 26 35.39 20.02
gaze adaptive 1 25 36.80 19.32

Effective target width was broadly similar between Static and Adaptive within each modality; gaze showed slightly larger We overall, consistent with higher variability in endpoint location.
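
The We values above follow the standard ISO 9241-9 convention, which derives We from the spread of selection endpoints; a sketch under that assumption (the 4.133 multiplier covers ~96% of a normal endpoint distribution):

```python
import math

def effective_width(endpoints):
    """We = 4.133 * SD of endpoint coordinates along the movement axis
    (ISO 9241-9 convention; sample SD with n-1 denominator)."""
    n = len(endpoints)
    mean = sum(endpoints) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in endpoints) / (n - 1))
    return 4.133 * sd

def effective_id(distance, we):
    """Effective index of difficulty, Shannon form: IDe = log2(D/We + 1)."""
    return math.log2(distance / we + 1.0)
```

For example, with a 300 px movement amplitude and the interim hand-static We of ~32.7 px, `effective_id(300, 32.7)` gives an IDe of roughly 3.3 bits.
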

Effective Target Width (Accuracy) by Modality and UI Mode. N = 26 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower values indicate tighter shot grouping and higher precision.

Endpoint Accuracy Scatter Plot

Visualization of endpoint errors relative to target center. Each point represents one trial’s endpoint position.

Endpoint Accuracy Scatter Plot for Gaze Modality. N = 26 participants. Each point represents one trial endpoint position relative to the target center (0,0). The red dashed circle shows the approximate target size. Points closer to the center indicate better accuracy. Dotted lines indicate zero error in X and Y directions. Faceted by pressure condition.
Endpoint Error Distance (px) for Gaze Modality
ui_mode pressure N Mean_Error SD_Error Median_Error
static 0 492 11.97 8.21 9.58
static 1 538 11.95 8.17 9.59
adaptive 0 557 11.72 7.96 9.61
adaptive 1 529 12.62 8.21 10.52

The “Midas Touch” Struggle

Planned Sample Size & Power

Target re-entries are count-like and somewhat noisy, but we again analyze participant-level averages with an LMM (or, if needed, a Poisson GLMM). We anticipate medium modality effects (more re-entries for gaze) and small-to-medium UI-mode effects (fewer re-entries under adaptation). Given the noisier nature of this metric, a slightly larger sample is desirable before treating it as confirmatory. We therefore treat N = 48 as adequate but exploratory and N = 64 as a “good” sample size for detecting medium within-subject effects in re-entry counts. Power reasoning follows the same logic as other continuous repeated-measures outcomes, tempered by mixed-model guidance from Kumle et al. (2021).

Target Re-entries measure how often the cursor drifted out of the target before selection.

Re-entries are interpreted here as a proxy for control stability; higher counts suggest more corrective movements. We will revisit this metric in the control-theory analyses (Section 10).
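
Counting re-entries from a sampled cursor trace reduces to counting out→in transitions after the first entry. A sketch under the assumption that the logger records a per-sample inside/outside flag (the actual logging schema may differ):

```python
def count_reentries(inside_flags):
    """Number of times the cursor re-entered the target after having
    entered and then left it; inside_flags is a per-sample boolean trace."""
    entries = 0
    was_inside = False
    for inside in inside_flags:
        if inside and not was_inside:
            entries += 1
        was_inside = inside
    return max(0, entries - 1)  # the first entry is not a re-entry

# Example: enter, drift out, re-enter, drift out, re-enter -> 2 re-entries
trace = [False, True, True, False, True, False, False, True]
```
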

Target Re-entries by Condition (N = 26 participants)
modality ui_mode pressure N_participants Mean_Reentries SD_Reentries
hand static 0 25 0.26 0.53
hand static 1 26 0.23 0.47
hand adaptive 0 25 0.23 0.46
hand adaptive 1 26 0.22 0.47
gaze static 0 23 2.15 1.21
gaze static 1 25 2.15 1.01
gaze adaptive 0 26 2.19 1.19
gaze adaptive 1 25 2.23 1.43

Target Re-entries (Control Stability) by Modality and UI Mode. N = 26 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower values are better.

8. Workload (NASA-TLX)

Subjective workload scores (lower is better).

Sample Size: N = 26 participants with TLX data.

NASA-TLX Workload Scores by Modality and UI Mode. N = 26 participants. Scores range from 0-100, where lower values indicate lower subjective workload. The six TLX scales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, Frustration) are shown separately. White diamonds show mean values. Violin plots show the distribution shape, boxplots show quartiles.

NASA-TLX Workload Components (Stacked Bar Chart). N = 26 participants. Total height represents overall workload, with each colored segment representing one of the six TLX scales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, Frustration). Lower total height indicates lower overall subjective workload.

Statistical Model: Overall TLX

Planned Sample Size & Power

NASA-TLX scores (overall and subscales) are collected at the block level and analyzed with an LMM (random intercepts per participant; fixed effects for modality and UI mode). TLX scores tend to be reasonably reliable, and we expect medium effects for both modality (gaze > hand) and UI mode (adaptive < static), especially on Physical Demand and Frustration. For within-subject designs with medium effects, ≈40–50 participants typically provide ≥0.80 power (Brysbaert, 2019). We therefore treat N = 48 as a good, pre-planned N for TLX analyses. An increase to N = 64 would mostly refine confidence intervals and interaction estimates rather than change the main power conclusions.

### Model: overall_tlx ~ modality * ui_mode + (1 | pid)

**Data Summary:** 26 participants, 243 observations.

#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
                  Sum Sq Mean Sq NumDF  DenDF F value   Pr(>F)    
modality         1458.47 1458.47     1 183.61 23.6453 2.48e-06 ***
ui_mode             2.38    2.38     1 183.60  0.0386   0.8444    
modality:ui_mode  115.18  115.18     1 183.63  1.8673   0.1735    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Estimated Marginal Means (Overall TLX by Modality × UI Mode)


Table: Estimated Marginal Means for Overall TLX by Condition (95% CI)

|Modality |UI Mode  | Mean TLX| 95% CI Lower| 95% CI Upper|
|:--------|:--------|--------:|------------:|------------:|
|Hand     |Static   |     42.7|         36.6|         48.8|
|Gaze     |Static   |     46.6|         40.5|         52.7|
|Hand     |Adaptive |     41.4|         35.1|         47.7|
|Gaze     |Adaptive |     48.3|         42.1|         54.5|


9. Learning Curves & Practice Effects

Research Question: How does performance change within each condition? Do learning rates differ by condition?

Sample Size: N = 26 participants with trial-level data.

Note: These learning curves serve as a quality check that participants improved modestly and reached a plateau; they are exploratory/QC only and not treated as primary inferential outcomes.

This section shows learning curves aligned by condition start (accounting for Williams counterbalancing). For block-level trends, see Section 12.

Learning Curve Data Summary by Condition (N = 26 participants)
Modality UI Mode Pressure N Positions Mean RT (s) Mean Error Rate
Hand Static OFF 27 1.114 0.0548
Hand Static ON 27 1.104 0.0427
Hand Adaptive OFF 27 1.067 0.0459
Hand Adaptive ON 27 1.111 0.0541
Gaze Static OFF 27 1.198 0.2373
Gaze Static ON 27 1.261 0.2325
Gaze Adaptive OFF 27 1.328 0.2045
Gaze Adaptive ON 27 1.302 0.2163

Learning Curves: Movement Time Within Condition. N = 26 participants. Learning aligned by position within condition (accounting for counterbalancing). LOESS smoothing. Lower is better. Shaded regions show 95% CI.
Error Rate Summary by Condition
Modality UI Mode Pressure N Positions Mean Error Rate Min Error Rate Max Error Rate
Hand Static OFF 27 5.48% 0.00% 12.00%
Hand Static ON 27 4.27% 0.00% 11.54%
Hand Adaptive OFF 27 4.59% 0.00% 16.00%
Hand Adaptive ON 27 5.41% 0.00% 23.08%
Gaze Static OFF 27 23.73% 8.33% 37.50%
Gaze Static ON 27 23.25% 7.69% 42.31%
Gaze Adaptive OFF 27 20.45% 3.85% 34.62%
Gaze Adaptive ON 27 21.63% 8.00% 40.00%

Learning Curves: Error Rate Within Condition. Learning aligned by position within condition (accounting for counterbalancing). LOESS smoothing. Lower is better. Shaded regions show 95% CI.

Note: Data aligned by position within condition to account for Williams counterbalancing. For block-level trends, see Section 12: Block Order & Temporal Effects.


10. Movement Quality Metrics

Submovement Analysis

Research Question: Does adaptive UI reduce movement corrections? How do submovements relate to performance?

Submovements indicate intermittent control - fewer submovements suggest smoother, more ballistic movements.
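The pre-computed counts are based on peaks in the speed profile. As a hedged illustration of that idea (the actual detector in FittsTask.tsx and its thresholds are not reproduced here; `min_peak` and `min_gap` below are placeholder values), a minimal Python sketch:

```python
import numpy as np
from scipy.signal import find_peaks

def count_submovements(speed, min_peak=0.5, min_gap=3):
    """Count distinct peaks in a speed profile.

    min_peak: minimum peak height (assumed threshold, not the study's value)
    min_gap:  minimum samples between peaks, merging sampling jitter
    """
    peaks, _ = find_peaks(np.asarray(speed, float),
                          height=min_peak, distance=min_gap)
    return len(peaks)

t = np.linspace(0, 1, 100)
# A smooth ballistic reach: one speed peak
ballistic = np.exp(-((t - 0.5) / 0.15) ** 2)
# The same reach followed by two corrective pulses: three peaks total
corrective = ballistic + 0.8 * np.exp(-((t - 0.8) / 0.03) ** 2) \
                       + 0.7 * np.exp(-((t - 0.92) / 0.03) ** 2)
print(count_submovements(ballistic), count_submovements(corrective))
```

Under this definition, a single-peaked speed profile (as in the hand trials below) yields one "submovement" at most; a count of zero simply means no peak cleared the detector's threshold.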

Planned Sample Size & Power

Submovement count is a noisier movement-quality metric and is currently based on pre-computed peaks. We anticipate small-to-medium effects of UI mode (adaptive reducing corrective movements) and medium effects of modality, but with considerable between-participant variability. For such count-based metrics, simulation-based power analysis is strongly recommended (e.g., using the approach in Kumle et al., 2021). As a rule of thumb, N = 64–72 would be needed to treat submovement differences as confirmatory (especially for UI-mode effects), whereas N = 48 is more appropriate for exploratory visualization and effect-size estimation rather than strict NHST.

Data Availability Note: Submovement metrics are available for a subset of the interim sample (see counts below). All results in this section are descriptive and should be treated as preliminary engineering diagnostics, not inferential findings. We distinguish between:

- Participants with submovement_count (legacy, pre-computed): N = 3
- Participants with submovement_count_recomputed (from trajectory data): N = 16
- Participants with full trajectory JSON data: N = 16

Submovement Count by Condition (N = 16 participants, using submovement_count_recomputed)
modality ui_mode pressure N_participants N_trials Mean SD Median
hand static 0 15 392 0.00 0.00 0
hand static 1 16 414 0.00 0.00 0
hand adaptive 0 15 388 0.00 0.00 0
hand adaptive 1 16 410 0.00 0.00 0
gaze static 0 15 303 8.68 4.96 7
gaze static 1 15 324 9.14 5.14 8
gaze adaptive 0 16 347 9.71 6.18 8
gaze adaptive 1 15 320 9.09 5.75 7

ℹ **Note:** Hand modality shows zero submovements, indicating very smooth movements
   with no detected corrective velocity peaks. This is valid data.

Submovements vs. Index of Difficulty. N = 19 participants. How movement corrections scale with task difficulty. Linear regression with 95% confidence intervals.

Verification Time Analysis

Research Question: How much time is spent “stopping” vs. “moving”? Does adaptive UI reduce verification time?

Sample Size: N = 26 participants with verification time data.

Planned Sample Size & Power

Verification time (from first target entry to final selection) is conceptually closer to a decision-phase measure and serves as a bridge to future LBA modeling. We again expect medium modality effects and small-to-medium UI-mode effects, and we analyze it via an LMM. Because this outcome is continuous and based on many trials per participant, N = 48 is a good target for medium effects, and N = 64 provides added stability for smaller UI-mode differences or more complex interaction patterns. The same repeated-measures power guidelines apply as for RT and TP (Cohen, 1988).

Verification time represents the “precise stopping” phase, separate from the ballistic movement phase.
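Computing verification time from a trial's samples is straightforward. A minimal Python sketch (the report's pipeline is in R, and the sample layout here is an illustrative assumption):

```python
import math

def verification_time_ms(samples, target_xy, target_radius, select_t):
    """Time from first target entry to final selection.

    samples: list of (t_ms, x, y) cursor samples, in time order.
    Returns None if the cursor never entered the target before selection.
    """
    tx, ty = target_xy
    for t, x, y in samples:
        if math.hypot(x - tx, y - ty) <= target_radius:
            return select_t - t  # first entry -> selection
    return None

# Cursor enters the 20 px target at t=600 ms; selection confirmed at t=950 ms
traj = [(0, 0, 0), (300, 80, 60), (600, 195, 148), (800, 200, 150)]
print(verification_time_ms(traj, (200, 150), 20, select_t=950))
```

Note this measures time from *first* entry; trials where the cursor exits and re-enters the target inflate the measure, which is exactly the verification-phase caution the planned LBA analysis targets.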


11. Error Patterns & Types

Research Question: What types of errors occur? Do error patterns differ by condition?

Sample Size: N = 26 participants with error type data.


**Error Type Summary:** Gaze produced substantially more misses than hand (error rates of roughly 20–24% for gaze vs. ~4–5% for hand across conditions), consistent with its lower throughput. Adaptive UI did not yet show a clear reduction in any specific error type at N=26.

12. Block Order & Temporal Effects

Research Question: Are there order effects? Does performance improve or degrade over blocks?

Sample Size: N = 26 participants with block-level data.

Note: This section is exploratory/QC only. These analyses serve as quality checks for temporal trends and are not treated as primary inferential outcomes.

Performance Across Blocks: Movement Time. Movement time by block number. Lower is better. Shaded regions show ±1 SE.
Block-Level Data Summary by Condition
Modality UI Mode Pressure N Blocks Mean Error Rate
Hand Static OFF 8 4.75%
Hand Static ON 8 3.35%
Hand Adaptive OFF 8 3.64%
Hand Adaptive ON 8 4.95%
Gaze Static OFF 8 22.96%
Gaze Static ON 8 21.06%
Gaze Adaptive OFF 8 19.74%
Gaze Adaptive ON 8 21.64%


Performance Across Blocks: Error Rate. Error rate by block number. Lower is better. Shaded regions show ±1 SE.

13. Spatial Patterns & Heatmaps

Research Question: Are there spatial biases in performance? Do some screen regions show better/worse performance?

Sample Size: N = 26 participants with spatial position data.

Note: These spatial visualizations are exploratory and serve as descriptive quality checks. They are not treated as primary inferential outcomes. At N=26, interpretation is limited, but these plots may be useful for understanding XR-specific spatial patterns (e.g., top vs bottom of visual field).

Performance by Target Position

Error Density Heatmap

Where do endpoint errors occur? Are there systematic spatial biases?


14. Adaptive UI Mechanism Analysis

Width Scaling (Target Size Adaptation)

Research Question: Does the adaptive UI dynamically change target sizes? How does width scaling relate to performance?

Sample Size: N = 19 participants with width scaling data.

Status: In the current dataset, the width scaling mechanism was disabled/misconfigured; all recorded width_scale_factor values equal 1.0. Results here serve as a template for future analysis once scaling is active.

The adaptive UI may scale target widths based on performance. This section examines whether and how target sizes are adjusted.

**Note:** No target width scaling was observed in this dataset.
All `width_scale_factor` values are 1.0 (no scaling applied).

This indicates that the adaptive policy did not trigger during data collection.
Possible reasons:
- Hysteresis gate threshold not met (requires N consecutive slow/error trials)
- Performance thresholds (RT p75, error burst) not exceeded
- Adaptive policy not properly configured or enabled
- Participants performed well enough that adaptation was not needed
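The gate logic listed above can be made concrete with a small sketch. This Python illustration of a hysteresis-gated width-scaling policy is hypothetical: the trigger criterion (3 consecutive slow-or-error trials) and the 1.25× scale factor are placeholders, not the experiment's configured values.

```python
def width_scale_policy(trials, rt_p75, consec_needed=3, scale_up=1.25):
    """Return the width scale factor in effect after each trial.

    trials: list of (rt_ms, is_error). A run of `consec_needed` trials that
    are slow (rt > rt_p75) or errors opens the gate and enlarges targets.
    """
    scales, streak, scale = [], 0, 1.0
    for rt, is_error in trials:
        streak = streak + 1 if (rt > rt_p75 or is_error) else 0
        if streak >= consec_needed:
            scale = scale_up          # gate triggers: enlarge target
            streak = 0                # reset hysteresis counter
        scales.append(scale)
    return scales

# Fast trials keep scale at 1.0; three slow/error trials in a row trigger it
history = [(900, False), (1600, False), (1700, True), (1800, False), (950, False)]
print(width_scale_policy(history, rt_p75=1500))
```

Under a policy like this, uniformly good performance (as observed here) never opens the gate, which is consistent with the all-1.0 scale factors in the table below.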
Target Width Scaling by Condition (N = 19 participants, No Scaling Observed)
modality ui_mode pressure N_participants N_trials Mean_Scale SD_Scale Mean_Diff SD_Diff Pct_Scaled
hand static 0 18 486 1 0 0 0 0
hand static 1 19 513 1 0 0 0 0
hand adaptive 0 18 486 1 0 0 0 0
hand adaptive 1 19 513 1 0 0 0 0
gaze static 0 18 486 1 0 0 0 0
gaze static 1 19 513 1 0 0 0 0
gaze adaptive 0 19 513 1 0 0 0 0
gaze adaptive 1 18 486 1 0 0 0 0

Target Width Scale Factor by Condition. N = 19 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Scale factor = 1.0 means no scaling (nominal size). Values > 1.0 indicate enlarged targets.

**Note:** All width scale factors are 1.0 (no scaling applied).
This indicates the adaptive policy did not trigger during data collection.

Width Scaling Over Time (by Trial Number). Shows how target scaling changes throughout the experiment. LOESS smoothing with 95% CI.

**Note:** All width scale factors remain at 1.0 throughout the experiment.
The adaptive policy did not trigger any target size adjustments.

Width Scale Factor vs. Movement Time. Does target scaling improve performance? Linear regression with 95% CI.

**Note:** All width scale factors are 1.0, so no relationship with performance can be assessed.
The plot shows all points at x=1.0, indicating no scaling occurred.

Alignment Gate Metrics

Research Question: If alignment gates are used, how do they affect performance? How often are false triggers detected?

Alignment gates may be used to ensure proper cursor alignment before selection. This section examines their usage and effectiveness.


**Alignment Gate Interpretation:** False triggers were rare (mean = 0.04 per trial). Adaptive UI did not show a meaningful change in false-trigger rate compared to Static at this interim N.

Alignment Gate False Triggers by Condition. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower is better.
ℹ **Note:** No recovery time data for gaze modality.
   This indicates the alignment gate always passed (no false triggers) for these trials.

Alignment Gate Recovery Time by Condition. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower is better.
ℹ **Note:** No mean recovery time data for gaze modality.
   This indicates the alignment gate always passed (no false triggers) for these trials.

Alignment Gate Mean Recovery Time by Condition. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower is better.

Task Type Analysis

Research Question: Are there different task types (point vs. drag)? How does performance differ across task types?

If the experiment includes different task types, this section examines performance differences.

Performance by Task Type
task_type modality ui_mode N_Trials Mean_RT SD_RT Error_Rate
drag hand static 666 1079.0 334.7 4.65
drag hand adaptive 666 1053.6 323.9 5.86
drag gaze static 618 1222.2 549.2 17.96
drag gaze adaptive 651 1268.7 583.9 20.28
point hand static 333 1071.9 306.6 4.80
point hand adaptive 333 1073.9 313.0 4.20
point gaze static 309 1223.2 536.9 22.98
point gaze adaptive 321 1288.5 554.2 17.13

Movement Time by Task Type. Raincloud plot: half-violins with boxplots inside, individual points. Comparison of performance across different task types (if multiple exist). Lower is better.

Planned Sample Size & Power

Path-length efficiency (actual path / straight-line amplitude) is analyzed at the trial level but interpreted as a within-subject continuous outcome, with expected medium modality differences (longer, less efficient paths for gaze) and small-to-medium UI-mode effects. We treat N = 48 as a reasonable “good N” for detecting medium effects (dz ≈ 0.4–0.5), and N = 64 as an ideal target if path efficiency becomes more central to the argument. At both Ns, this analysis is secondary to the core throughput and RT results.
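Path-length efficiency as defined above (straight-line amplitude divided by actual path length) is simple to compute from cursor samples. A minimal Python sketch:

```python
import numpy as np

def path_efficiency(points):
    """Efficiency = straight-line amplitude / actual path length (<= 1.0)."""
    pts = np.asarray(points, float)
    steps = np.diff(pts, axis=0)
    path_len = np.linalg.norm(steps, axis=1).sum()   # cumulative path length
    amplitude = np.linalg.norm(pts[-1] - pts[0])     # start-to-end distance
    return amplitude / path_len

straight = [(0, 0), (50, 0), (100, 0)]   # perfectly straight path: 1.0
detour = [(0, 0), (50, 40), (100, 0)]    # deviates off-axis: < 1.0
print(round(path_efficiency(straight), 3), round(path_efficiency(detour), 3))
```

The reciprocal of this value is the path ratio (Mean_Ratio in the table below); the mean ratios of ≈2.1–2.4 correspond to efficiencies of ≈0.51–0.55.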

Path Length and Efficiency Metrics by Condition
modality ui_mode pressure N_Trials Mean_Path_Length Mean_Amplitude Mean_Ratio Mean_Efficiency Mean_Excess Mean_RT
hand static 0 348 753.0 370.1 2.40 0.516 382.9 1076.5
hand static 1 367 753.5 372.2 2.39 0.513 381.3 1082.5
hand adaptive 0 347 755.9 372.3 2.43 0.519 383.5 1037.6
hand adaptive 1 367 760.0 374.4 2.40 0.513 385.6 1094.3
gaze static 0 267 688.1 358.1 2.11 0.543 330.1 1239.6
gaze static 1 294 702.2 364.6 2.12 0.541 337.6 1272.5
gaze adaptive 0 313 701.9 367.4 2.08 0.551 334.4 1349.7
gaze adaptive 1 291 767.3 362.2 2.28 0.508 405.1 1308.8

Path Length vs. Movement Time (Log-Log Scale). 2D density plot showing the relationship between actual cursor path length and movement time. GAM smooth captures nonlinearity. Log scales handle right-skewed distributions and heteroscedasticity.

Path Efficiency vs. Movement Time. Path efficiency (A / path length) indicates how straight the movement was. Higher efficiency (closer to 1.0) means straighter paths. This plot shows whether inefficient movements lead to longer movement times, and whether adaptive UI improves efficiency.
⚠ Cannot create ID bins: insufficient variation or invalid break points.
Skipping ID binning plot.

Individual Differences in Path Efficiency. Thin lines show per-participant mean efficiency by UI mode. Thick line and large point show condition mean. Shows whether adaptive UI consistently improves efficiency across participants.

15. Gaze-Specific Analysis: Hover/Dwell Time

Research Question: How does hover/dwell time vary across gaze conditions? Does adaptive UI affect dwell time before confirmation?

Planned Sample Size & Power

Hover/dwell time is modeled only for gaze trials with fixed effects for UI mode and pressure. Because this shrinks the effective dataset and the expected UI-mode effects may be small-to-medium (dz ≈ 0.3–0.5), we treat this analysis as exploratory unless N ≥ 64. At N = 48, the study is adequately powered for medium effects but underpowered for smaller ones; at N = 64, we expect ≈0.80 power even if the UI-mode effect is closer to dz ≈ 0.35, based on standard repeated-measures calculations and mixed-model heuristics (Cohen, 1988; Kumle et al., 2021).

Sample Size: N = 0 participants with gaze hover/dwell data (no valid data logged in the interim sample).

Note: This analysis is exploratory. Because no hover/dwell data are available at the interim N, it will be revisited at N=48.

Hover/dwell time represents the duration the cursor remains in the target before confirmation in gaze trials. This metric is specific to gaze modality and reflects the “Midas touch” problem—the need for deliberate confirmation to avoid unintended selections.

⚠ No valid hover/dwell time data available for gaze trials.

Planned analyses deferred to the full sample:

  • Hierarchical LBA (verification-time RTs); requires more data and packages (RWiener/rtdists).
  • Control-theory kinematics (velocity profiles, submovement decomposition) once trajectories are vetted.
  • Identification checks: need ≥2 levels per factor and adequate trial counts per cell (~24+).
  • Error-type breakdowns and spatial heatmaps remain exploratory/QC until full N.
  • Hover/dwell and path-length efficiency to be revisited once gaze data are richer.



16. Planned Advanced Analysis: Linear Ballistic Accumulator (LBA) – No models fit yet at current interim N

Research Question: Can we model the verification phase (time from target entry to selection) using LBA parameters? Do adaptive conditions show different decision thresholds?

Status: ⚠️ PLANNED ANALYSIS - No LBA models fit yet at current interim N (N=26).

Planned Sample Size & Power

The hierarchical LBA analysis will be run on verification-time RTs with parameters (v, b, A, t₀) varying by modality and UI mode. Power and parameter recovery in diffusion/accumulator models depend more on trials per participant than on sheer N, but group-level comparisons still require a sufficient number of participants. Studies on parameter recovery for DDM/LBA and related models generally recommend ≥100 trials per condition and at least 30–40 participants for stable hierarchical estimates. Our design (≈24 trials × 8 conditions ≈ 192 trials per participant) is already strong on the trial side. For group-level parameter differences, however, a target of N ≥ 64 is advisable; N = 48 is workable but will lead to wider credible intervals on parameter contrasts. Ideally, we would validate LBA power for this specific parameterization via simulation (e.g., using the approach described in Kumle et al., 2021, for mixed models).

This section documents the planned LBA analysis. No LBA models have been fit for the current interim dataset; this is included for transparency and to guide future work.

Linear Ballistic Accumulator models decompose reaction time into decision and non-decision components. For gaze-based interaction, we hypothesize that adaptive UI reduces decision threshold (b), indicating less caution needed when targets are easier to acquire.
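To make the planned model concrete, here is a minimal Python simulation of a standard two-accumulator LBA: start point k ~ Uniform(0, A), trial-wise drift ~ Normal(v, s) truncated positive, response issued when the fastest accumulator reaches threshold b, plus non-decision time t0. The parameter values are illustrative, not fitted to these data.

```python
import random

def lba_trial(v=(3.0, 1.0), b=1.0, A=0.5, s=0.3, t0=0.2, rng=random):
    """Simulate one LBA trial; return (choice index, RT in seconds)."""
    times = []
    for vi in v:
        k = rng.uniform(0, A)            # start point for this accumulator
        d = rng.gauss(vi, s)
        while d <= 0:                    # truncate drift rates at zero
            d = rng.gauss(vi, s)
        times.append((b - k) / d)        # time to travel from k to b
    fastest = min(range(len(v)), key=times.__getitem__)
    return fastest, t0 + times[fastest]

rng = random.Random(7)
trials = [lba_trial(rng=rng) for _ in range(2000)]
acc = sum(c == 0 for c, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(round(acc, 3), round(mean_rt, 3))
```

Lowering b in this simulation shortens RTs with some cost to accuracy, which is the signature the hypothesis above predicts for adaptive conditions.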

⚠️ **LBA Analysis Not Yet Implemented**
This section will analyze verification-time RTs using hierarchical Linear Ballistic Accumulator models.
**Current Status:**
- Basic verification_time_ms analysis: ✅ DONE (Section 10)
- LBA model fitting: ❌ PENDING
- Hierarchical parameter estimation: ❌ PENDING
**Implementation Plan:**
- LBA requires RT data from the verification phase (time from target entry to selection)
- Model fitting can be done using `RWiener` or `rtdists` packages
- Key parameters to estimate: drift rate (v), threshold (b), starting point (A), non-decision time (t0)
- Hypothesis: Adaptive conditions should show lower threshold (b), indicating less caution needed
- Hierarchical modeling will account for participant-level variation
**Data Requirements:**
- Verification time data available: 2656 trials
- Requires sufficient trial counts per condition for stable parameter estimation
- Will be implemented once N reaches target sample size
**Power Considerations:**
- N=48 is sufficient for medium main effects (dz≈0.41, power≈0.80)
- LBA parameters require careful convergence diagnostics
- See `POWER_ANALYSIS_EXPERT_RESPONSE.md` for detailed recommendations

17. Planned Control Theory Analysis: Submovement Models – Trajectory-based metrics not yet implemented at current interim N

Research Question: How does the control loop efficiency differ across conditions? Do adaptive interventions reduce movement corrections?

Status: ⚠️ PLANNED ANALYSIS - No trajectory-based models fit yet at current interim N (N=26).

Planned Sample Size & Power

Trajectory-based kinematic metrics (velocity profiles, jerk, normalized jerk, primary vs corrective phases) are rich but correlated and often noisier than basic RT/TP measures. Because they are derived from the same trial-level data, their within-subject effect sizes are likely small-to-medium, with substantial individual differences. For these analyses, N = 48 is adequate for descriptive modeling and estimation, while N = 64 is a good target if we plan to make stronger inferential claims about UI-mode improvements in movement smoothness or control-loop efficiency. As with LBA, simulation-based power analyses tailored to these specific metrics would be ideal but are beyond the scope of this report (Kumle et al., 2021).

This section documents the planned control theory analysis. Submovement metrics in this report are limited to pre-computed submovement_count (see Section 10). Full trajectory-based control-theory models (jerk, duration-normalized jerk, primary vs corrective phases) will be implemented once trajectory logging is complete across participants. No trajectory-based models have been fit for the current interim dataset; this is included for transparency and to guide future work.

The Optimized Submovement Model [@meyer1988] posits that pointing movements are composed of a primary ballistic impulse followed by n corrective submovements. The Submovement Count (N_sub) serves as a proxy for the efficiency of the control loop. In gaze-based interaction, simulated lag and saccadic blindness force users into an intermittent control regime, theoretically increasing N_sub.
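The decomposition idea behind N_sub can be sketched by locating negative-to-positive zero-crossings in the acceleration profile: each one marks a speed dip followed by a new corrective pulse. This is a hedged Python illustration of the planned zero-crossing detector, not the study's implemented algorithm:

```python
import numpy as np

def submovement_count_from_speed(speed, dt=1.0):
    """Count submovements as 1 (primary impulse) plus the number of
    negative-to-positive zero-crossings in acceleration (d speed / dt)."""
    acc = np.diff(np.asarray(speed, float)) / dt
    # a (-) -> (+) transition means speed dipped and rose again
    onsets = np.sum((acc[:-1] < 0) & (acc[1:] > 0))
    return 1 + int(onsets)

t = np.linspace(0, 1, 200)
one_pulse = np.exp(-((t - 0.5) / 0.1) ** 2)            # single ballistic bell
two_pulse = np.exp(-((t - 0.3) / 0.08) ** 2) + \
            0.9 * np.exp(-((t - 0.72) / 0.08) ** 2)    # dip between two bells
print(submovement_count_from_speed(one_pulse),
      submovement_count_from_speed(two_pulse))
```

Real trajectories need low-pass filtering first; raw 60fps samples would turn sensor noise into spurious zero-crossings.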

Power Analysis Summary:

- N=48 is sufficient for medium main effects (dz≈0.41, power≈0.80)
- Interactions will be underpowered unless large (treat as exploratory)
- 60fps trajectory data improves measurement precision but doesn’t increase effective N
- Key considerations: use duration-normalized smoothness metrics, control for multiple comparisons (FDR), pre-specify outcomes
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed recommendations

⚠️ **Advanced Control Theory Analysis Not Yet Implemented**
This section will analyze movement control using the Optimized Submovement Model.
**Current Status:**
- Basic submovement_count analysis: ✅ DONE (Section 10)
- Velocity profile analysis: ❌ PENDING
- Submovement detection algorithm: ❌ PENDING
- Primary vs. corrective movement decomposition: ❌ PENDING
✅ Submovement count data available:
   - N trials with submovement data: 530 
   - Mean submovements per trial: 7.47 
   - Range: 0 - 70 

**Submovement Count by Modality and UI Mode:**


|modality |ui_mode  | Mean_Submov| SD_Submov|
|:--------|:--------|-----------:|---------:|
|hand     |static   |        0.00|      0.00|
|hand     |adaptive |        0.00|      0.00|
|gaze     |static   |       17.79|      9.74|
|gaze     |adaptive |       15.76|      7.81|

**Data Quality Check:**
   ✅ Trajectory data available in CSV:
      - N trials with trajectory: 3348 
      - Trajectory stored as JSON string in 'trajectory' column
      - Can be parsed in R: jsonlite::fromJSON(trajectory)
   - Current analysis uses pre-calculated submovement_count from FittsTask.tsx

**Next Steps for Advanced Analysis:**
1. Parse the trajectory JSON already logged in the CSV (jsonlite::fromJSON) into per-trial time series
2. Implement velocity profile extraction from trajectory
3. Detect submovements using zero-crossings in acceleration profile
4. Decompose primary vs. corrective movements
5. Compare control loop efficiency across conditions
6. Test hypothesis: Adaptive UI → fewer submovements (more ballistic)
⚠️ **Advanced Control Theory Metrics Not Yet Implemented**
**Planned Analyses:**
1. **Velocity Profile Analysis:**
   - Peak velocity extraction
   - Time to peak velocity (TPV)
   - Deceleration phase duration
   - Velocity profile asymmetry
2. **Submovement Detection:**
   - Zero-crossing detection in acceleration profile
   - Primary movement identification (first ballistic phase)
   - Corrective submovement count and duration
   - Inter-submovement intervals
3. **Control Loop Efficiency:**
   - Ratio of primary to total movement time
   - Correction frequency (submovements per second)
   - Movement smoothness metrics (jerk, normalized jerk - MUST be duration-normalized)
4. **Modality-Specific Patterns:**
   - Gaze: Intermittent control due to lag and saccadic blindness
   - Hand: Continuous control with proprioceptive feedback
   - Adaptive: Reduced corrections due to target expansion/declutter
**Data Requirements:**
✅ Trajectory data is now available in 'trajectory' column (JSON string, ~60fps)
✅ Current CSV has submovement_count (pre-calculated) AND raw trajectory

**Power & Analysis Considerations:**
- N=48 is sufficient for main effects (dz≈0.41, power≈0.80)
- Interactions: Underpowered, treat as exploratory
- 60fps improves measurement precision but doesn't increase effective N
- Use duration-normalized smoothness metrics
- Control for multiple comparisons (FDR) if testing many metrics
- Pre-specify theoretically motivated outcomes

**See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed recommendations**

Implementation Notes:

- Basic submovement analysis is already in Section 10 (Movement Quality Metrics)
- Trajectory data is now available in the trajectory column (JSON string, logged at ~60fps)
- Current submovement_count is pre-calculated in FittsTask.tsx using velocity peaks
- Power: N=48 sufficient for main effects (dz≈0.41, power≈0.80); interactions underpowered (treat as exploratory)
- Key considerations:
  - Use duration-normalized smoothness metrics (jerk is duration-sensitive)
  - Control for multiple comparisons (FDR) if testing many kinematic features
  - Pre-specify a small set of theoretically motivated outcomes
  - 60fps improves measurement precision but doesn’t increase effective N
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed power analysis and recommendations
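The duration-normalization point deserves a concrete form: raw integrated squared jerk scales steeply with movement duration and amplitude, so smoothness is usually compared via the dimensionless jerk DJ = (T⁵/A²)·∫j(t)²dt, often reported as −ln(DJ). A minimal Python sketch of this standard formulation (not the report's implemented code):

```python
import numpy as np

def log_dimensionless_jerk(pos, dt):
    """Smoothness: -ln( (T^5 / A^2) * integral of squared jerk ).
    Higher (less negative) values indicate smoother movement."""
    pos = np.asarray(pos, float)
    jerk = np.diff(pos, 3) / dt ** 3          # third derivative (finite diff)
    duration = (len(pos) - 1) * dt
    amplitude = abs(pos[-1] - pos[0])
    dj = (duration ** 5 / amplitude ** 2) * np.sum(jerk ** 2) * dt
    return -np.log(dj)

dt = 0.01
t = np.linspace(0, 1, 101)
# Minimum-jerk reach profile vs. the same reach with a tremor component
smooth = 10 * (10 * t**3 - 15 * t**4 + 6 * t**5)
shaky = smooth + 0.2 * np.sin(40 * np.pi * t)
print(log_dimensionless_jerk(smooth, dt) > log_dimensionless_jerk(shaky, dt))
```

Because DJ divides out both T⁵ and A², slow-but-smooth and fast-but-smooth reaches score alike, which is exactly why the raw (un-normalized) jerk is flagged above as duration-sensitive.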

Potential Issues to Check:

- Verify that the submovement_count calculation in FittsTask.tsx matches the Optimized Submovement Model definition
- Check whether velocity-profile data is needed or pre-calculated counts are sufficient
- Ensure the submovement detection algorithm handles both hand and gaze modalities correctly


18. Summary & Conclusions

Key Findings Summary

Summary of Key Metrics by Condition (Interim N=26)
modality ui_mode Metric Mean SD
hand static Effective Width (px) 32.740 20.890
hand adaptive Effective Width (px) 33.140 21.000
gaze static Effective Width (px) 36.020 20.450
gaze adaptive Effective Width (px) 36.090 19.620
hand static Error Rate (%) 4.870 21.520
hand adaptive Error Rate (%) 5.010 21.820
gaze static Error Rate (%) 23.480 42.400
gaze adaptive Error Rate (%) 21.020 40.760
hand static Movement Time (s) 1.109 0.350
hand adaptive Movement Time (s) 1.089 0.317
gaze static Movement Time (s) 1.230 0.511
gaze adaptive Movement Time (s) 1.313 0.642
hand static Throughput (bits/s) 3.510 0.930
hand adaptive Throughput (bits/s) 3.550 0.950
gaze static Throughput (bits/s) 3.110 1.100
gaze adaptive Throughput (bits/s) 3.000 1.100
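A common way to compute the effective width and throughput summarized above is the ISO-9241-style formulation for Fitts' law studies: We = 4.133 × SD of endpoint error, IDe = log2(A/We + 1), TP = IDe/MT. Whether this exact variant matches the pipeline's computation should be verified; the Python sketch below uses illustrative endpoint values, not the study's per-trial data.

```python
import math
import statistics

def throughput(amplitude, endpoint_errors, mt_s):
    """Fitts throughput (bits/s) via the effective-width method."""
    we = 4.133 * statistics.pstdev(endpoint_errors)   # effective width (px)
    ide = math.log2(amplitude / we + 1)               # effective ID (bits)
    return ide / mt_s

# Signed endpoint errors along the movement axis (px), illustrative values
errors = [-10, -5, -2, 0, 1, 3, 5, 8, -7, 7]
print(round(throughput(amplitude=370, endpoint_errors=errors, mt_s=1.11), 2))
```

Because We grows with endpoint scatter, the higher effective widths and longer movement times for gaze jointly explain its lower throughput in the table.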

Data Quality Notes

  • Participants: 26
  • Valid Trials: 4734 (out of 5481 total experimental trials)
  • Exclusion Rate: 14% (due to errors, timeouts, or invalid RTs)
  • Trials per Participant: Mean = 182.1, Range = 99 - 213

Participant Exclusions

Excluded Participants: Seven participants (P002, P003, P007, P008, P015, P039, P040) were excluded from the main 2×2×2 factorial analysis due to a data logging error.

Reason: A bug in the data logging code (fixed December 8, 2025, commit 04758db) incorrectly recorded all trials as pressure = 1 regardless of block condition. The bug was caused by passing the pressure value (always 1.0) instead of the pressure condition boolean (pressureEnabled) to the logging function in TaskPane.tsx line 1105.

Impact:

- All 7 affected participants have only pressure = 1 data
- Modality and UI Mode were logged correctly (0 mismatches)
- Without both pressure conditions (0 and 1), these participants cannot contribute to the full factorial model

Resolution:

- Bug fixed and deployed (commit 04758db)
- Seven replacement participants (P049-P055) will be collected to maintain the planned N=48
- Affected participants’ data retained for exploratory analyses

Current Sample (Interim): N=26 participants with complete data across all experimental conditions.

Planned Final Sample: N=48 participants with complete data across all experimental conditions (not yet achieved in this interim report).

Note: All summary statistics above are based on the current interim sample (N=26). Effect sizes and p-values may change as more data are collected.

For detailed exclusion criteria, see EXCLUSION_CRITERIA.md. For technical audit details, see AUDIT_REPORT.md.